arXiv:1504.05451v2 [cs.CV] 22 Apr 2015


Adaptive Compressive Tracking via Online Vector 
Boosting Feature Selection 


Qingshan Liu, Jing Yang, Kaihua Zhang*, Yi Wu 

Jiangsu Key Laboratory of Big Data Analysis Technology, Nanjing University of 
Information Science and Technology, Nanjing, China 


Abstract 

Recently, the compressive tracking (CT) method [1] has attracted much attention due to its high efficiency, but it cannot deal well with large-scale target appearance variations because its data-independent random projection matrix yields less discriminative features. To address this issue, in this paper we propose an adaptive CT approach that selects the most discriminative features to design an effective appearance model. Our method significantly improves CT in three aspects. First, the most discriminative features are selected via an online vector boosting method. Second, the object representation is updated in an effective online manner, which preserves the stable features while filtering out the noisy ones. Finally, a simple and effective trajectory rectification approach is adopted that makes the estimated location more accurate. Extensive experiments on the CVPR2013 tracking benchmark demonstrate the superior performance of our algorithm compared with state-of-the-art tracking algorithms.
1. Introduction

Object tracking is a fundamental problem in computer vision with numerous applications such as motion analysis, surveillance, and autonomous robots, and much progress has been made in recent years [2]. However, it remains a challenging task due to factors such as illumination changes, partial occlusion, deformation, and viewpoint variation, to name a few [3]. To handle these factors well, an effective appearance model is of great importance, and numerous design schemes have been proposed [?], which can be categorized into either generative models or discriminative ones.

Generative models learn an appearance model from the object information, which is then used to search a certain region for the candidate with the minimum reconstruction error. Adam et al. [?] represent the target appearance with the intensity histograms of multiple fragments that can be efficiently computed by integral images. Ross et al. [13] present a tracking method that incrementally learns a low-dimensional subspace representation, which can effectively adapt to target appearance changes. Mei and Ling [?] treat tracking as a sparse representation problem, in which the target location is determined by solving an $\ell_1$ minimization problem. Bao et al. [?] utilize the accelerated proximal gradient approach to efficiently solve the $\ell_1$ minimization problem for visual tracking. In [?], Kwon and Lee propose a visual tracking decomposition method that combines multiple observation and motion models for robust visual tracking, which has been further extended to search for appropriate trackers by Markov Chain Monte Carlo sampling [?]. Zhang et al. [?] formulate the tracking task as a multi-task sparse learning problem. In [?], Jia et al. model the object appearance as sparse codes of local image patches with their spatial layout. In [7], Zhong et al. propose a collaborative tracking algorithm that combines a sparsity-based discriminative classifier and a sparsity-based generative model. Wang et al. [?] present a least soft-threshold squares algorithm that models the image noise with a Gaussian-Laplacian distribution.

Discriminative models learn a binary classifier to distinguish the target from its surrounding background. Avidan [?] first proposes to utilize a support vector machine classifier for visual tracking. In [18], an online discriminative feature selection technique is proposed to extract the most discriminative features to separate the target from the background. Grabner et al. [4] propose an online boosting method to select features for visual tracking. Babenko et al. [10] formulate the tracking task as a multiple instance learning (MIL) problem and propose an online MIL boosting method that selects features to design an appearance model. Zhang and Song [?] further extend the MIL tracking method by considering sample importance. Kalal et al. [?] integrate a re-detection module into tracking that can restart tracking after the target reappears when it has been completely occluded or has left the scene. Hare et al. [20] exploit the constraints of the predicted outputs with a kernelized structured SVM classifier, which achieves favorable results on the CVPR2013 tracking benchmark [21]. Henriques et al. [22] propose a fast tracking algorithm that exploits the circulant structure of the kernel matrix in a kernel classifier, which can be efficiently computed by the fast Fourier transform algorithm. Zhang et al. [1] propose a real-time compressive tracking (CT) algorithm that employs a very sparse random matrix to achieve a low-dimensional image representation. In [23], Zhang et al. further reduce the computational complexity of CT with a coarse-to-fine strategy. Song [25] improves the performance of CT by introducing an informative feature selection strategy.

Recently, Wu et al. [21] released the CVPR2013 tracking benchmark, which contains 50 challenging sequences (~26000 frames), most of which involve large-scale target appearance variations. Results of 29 tracking algorithms, including most of the tracking algorithms mentioned above, are reported. Although CT is very efficient, running at over 60 frames per second (FPS), its success rate in one-pass evaluation (OPE) is only 30.6%. We argue that the unfavorable performance of CT is due to its data-independent random projection matrix, which only yields fixed feature templates that cannot adapt well to large-scale target appearance variations. In this paper, we propose an adaptive CT method that selects the most discriminative patches to construct the feature templates via a novel online vector boosting method. Furthermore, we adopt a new model update mechanism that preserves the stable features while avoiding the noisy ones, thereby effectively alleviating the drift problem caused by online model updates. Finally, we propose a very simple trajectory rectification method that makes the final estimated location more accurate. Extensive evaluations on the CVPR2013 tracking benchmark [21] demonstrate that the proposed algorithm performs favorably against state-of-the-art methods in terms of efficiency, accuracy, and robustness; in particular, it outperforms CT by a large margin (the OPE success rate of the proposed method is 50.4% vs. 30.6% for CT).

The key contributions of the proposed algorithm are summarized as follows:

• To the best of our knowledge, this is the first work to explore the online vector boosting [?] feature selection method for visual tracking, in which the selected features can adapt well to target appearance variations.

• A novel trajectory rectification strategy is proposed that can be readily extended to other tracking algorithms to ensure more accurate and stable tracking results.

• Our tracker achieves favorable results, with 50.4% in the success plots and 71.4% in the precision plots, ranking 1st on the CVPR2013 tracking benchmark [21] and showing the power of simple Haar-like features.

Figure 1: The main components of CT


2. Compressive Tracking 

The idea of CT [1] is motivated by compressive sensing theory [?], in which the random projections of a sufficiently high-dimensional feature vector contain enough information to reconstruct the original high-dimensional one. The main components are illustrated in Figure 1. First, each sample is represented by a high-dimensional multiscale vector obtained by convolving each patch with a set of rectangle filters. Then, the vector is projected onto a low-dimensional space with a very sparse random projection matrix that satisfies the restricted isometry property (RIP) of compressive sensing theory. The original high-dimensional feature vectors can discriminate the target from its local background well, while high efficiency is achieved by the very sparse random matrix; thereby, CT performs well on some challenging sequences in terms of both efficiency and accuracy.

CT employs a very sparse random matrix $R \in \mathbb{R}^{n \times m}$ to project the high-dimensional vector $\mathbf{x} \in \mathbb{R}^{m}$ onto a low-dimensional vector $\mathbf{v} \in \mathbb{R}^{n}$:

$$\mathbf{v} = R\mathbf{x}, \tag{1}$$

where the entry of $R$ is defined as

$$r_{ij} = \sqrt{\rho} \times \begin{cases} 1 & \text{with probability } \frac{1}{2\rho}, \\ 0 & \text{with probability } 1 - \frac{1}{\rho}, \\ -1 & \text{with probability } \frac{1}{2\rho}, \end{cases} \tag{2}$$

where $\rho$ is determined by $w$ and $h$, the width and height of the object, respectively.

In (1), the $i$-th feature $v_i$ in the compressed vector $\mathbf{v}$ can be represented as

$$v_i = \sum_{j=1}^{m} r_{ij} x_j. \tag{3}$$

Figure 2: Each compressed feature is constructed by several feature templates
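As a minimal sketch of (1)-(3), the snippet below samples a very sparse random matrix with the entry distribution of (2) and projects a feature vector. It is plain Python for readability; `rho` is passed in directly because its exact dependence on $w$ and $h$ is not given here, and the function names are hypothetical.

```python
import math
import random


def sparse_random_matrix(n, m, rho, seed=0):
    """Sample an n x m matrix whose entries follow (2): +sqrt(rho) and
    -sqrt(rho) each with probability 1/(2*rho), and 0 otherwise."""
    if rho < 1:
        raise ValueError("rho must be >= 1 for (2) to be a valid distribution")
    rng = random.Random(seed)
    scale = math.sqrt(rho)
    matrix = []
    for _ in range(n):
        row = []
        for _ in range(m):
            u = rng.random()
            if u < 1.0 / (2.0 * rho):
                row.append(scale)
            elif u < 1.0 / rho:
                row.append(-scale)
            else:
                row.append(0.0)
        matrix.append(row)
    return matrix


def project(matrix, x):
    """Compute v = R x as in (1); each v_i is the sum in (3), skipping zero entries."""
    return [sum(r * xj for r, xj in zip(row, x) if r != 0.0) for row in matrix]


R = sparse_random_matrix(n=5, m=40, rho=4.0, seed=1)
v = project(R, [float(j) for j in range(40)])
print(len(v))  # 5
```

Skipping the zero entries is what makes the projection cheap: with this distribution only about a fraction $1/\rho$ of each row contributes to $v_i$.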


Figure 2 illustrates that $v_i$ in (3) is constructed from several feature templates whose sizes and locations are set randomly and then fixed during tracking. Although this random selection strategy is simple and efficient, it has some limitations that make CT perform unfavorably when the target appearance varies significantly. First, the feature templates may select noninformative features when they fall into textureless regions. Second, the fixed templates cannot adapt well to target appearance variations. In the following section, we propose an adaptive CT that deals with these issues.

3. Adaptive Compressive Tracking 

3.1. Algorithm overview 

Figure 3 illustrates the basic flow of our tracking algorithm, which is summarized in Algorithm 1 and mainly consists of three steps. First, we construct a set of positive and negative feature template bags $\{B_i^{+}, B_i^{-}\}_{i=1}^{c}$, in which each bag $B_i = \{z_{ij}\}_{j=1}^{n}$ contains $n$ rectangle feature templates, each template $z_{ij}$ representing a vectorized image patch inside the blue rectangle; we then select $k$ templates from each bag via an online vector boosting feature selection strategy. Second, to take into account target appearance variations over time, we exploit an online template update strategy that preserves the stable feature templates while replacing the ones with remarkable changes by a linear combination of the former and current templates. Finally, when the maximum classifier confidence score at the estimated tracking location is lower than a threshold $\theta$, which indicates that the estimate is inaccurate, we employ a trajectory rectification strategy that utilizes the former tracking motion information to predict the current tracking location.

Figure 3: Flowchart of ACT. OVB is short for online vector boosting (refer to Algorithm 2).

3.2. Online vector boosting feature selection 

As illustrated in Figure 2, the templates in CT [1] are constructed from several patches with random locations and sizes, where the size of each patch ranges from $1 \times 1$ to $w \times h$ pixels, with $w$ and $h$ the width and height of the object, respectively. However, a patch that is too small is vulnerable to small noisy appearance variations, while a patch that is too large cannot distinguish the most likely target from its neighboring counterparts due to its large support. To handle this problem, we constrain the width $w_i$ and height $h_i$ of the feature templates in the $i$-th bag by $2 \le w_i \le \mathrm{round}(w/2)$ and $2 \le h_i \le \mathrm{round}(h/2)$.

Furthermore, to take into account multiscale appearance information, we set the patches in the same bag to the same size, while the patches in different bags have varying sizes. Next, we introduce our OVB feature selection method, which selects the most discriminative feature templates from each bag to design a strong classifier.

Algorithm 1 Adaptive Compressive Tracking (ACT)
Input: The $t$-th image frame
1: Sample a set of image patches $D^{\gamma} = \{p \mid \|l_t(p) - l_{t-1}\| < \gamma\}$, where $l_{t-1}$ is the tracking location at the $(t-1)$-th frame, and extract features with the feature templates $T$
2: Apply the classifier $H(\cdot)$ in (15) to each feature vector and obtain the maximum confidence score $conf$
3: if $conf < \theta$ then
4:     Rectify the tracking location $l_t$ via (18)
5: else
6:     Find the tracking location $l_t$ by maximizing the classification score
7: end if
8: Sample two sets of image patches $D^{\alpha} = \{p \mid \|l_t(p) - l_t\| < \alpha\}$ and $D^{\zeta,\beta} = \{p \mid \zeta < \|l_t(p) - l_t\| < \beta\}$ with $\alpha < \zeta < \beta$
9: Update the feature template bags $B$ and the classifier parameters
Output: Tracking location $l_t$
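The bag construction described above can be sketched as follows, under stated assumptions: templates are axis-aligned rectangles stored as hypothetical `(x0, y0, width, height)` tuples relative to the object box, their positions are drawn uniformly so that they stay inside the box, and $\mathrm{round}$ is taken as round-half-up (Python's built-in `round` rounds halves to even). Every template in a bag shares one size, while each bag draws its own size within the constraint.

```python
import random


def round_half_up(value):
    return int(value + 0.5)


def make_bags(w, h, c, n, seed=0):
    """Build c bags of n templates each (hypothetical helper).

    All templates in bag i share a size (w_i, h_i) with
    2 <= w_i <= round(w/2) and 2 <= h_i <= round(h/2); only their
    locations inside the w x h object box differ.
    """
    max_w, max_h = round_half_up(w / 2), round_half_up(h / 2)
    if max_w < 2 or max_h < 2:
        raise ValueError("object box too small for the size constraint")
    rng = random.Random(seed)
    bags = []
    for _ in range(c):
        wi = rng.randint(2, max_w)  # randint is inclusive on both ends
        hi = rng.randint(2, max_h)
        bag = [(rng.randint(0, w - wi), rng.randint(0, h - hi), wi, hi)
               for _ in range(n)]
        bags.append(bag)
    return bags


# c = 150 bags of n = 30 templates, as in the parameter setting of Section 4.1.
bags = make_bags(w=40, h=60, c=150, n=30)
```

Drawing the size once per bag, rather than once per template, is what gives each bag a single scale while the collection of bags covers many scales.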

As illustrated in Figure 4, given the positive and negative feature template bags $\{B_i^{+}, B_i^{-}\}$, $i = 1, \ldots, c$, we define the margin between them as the sum, over all bags, of the Euclidean distances between the average positive and negative feature vectors:

$$\mathrm{margin} = \sum_{i=1}^{c} \mathrm{margin}_i, \tag{4}$$


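where $\mathrm{margin}_i$ is defined as

$$\begin{aligned} \mathrm{margin}_i &= \|\bar{Z}_i^{+} - \bar{Z}_i^{-}\|_2 \\ &= \sqrt{\bar{Z}_i^{+\top}\bar{Z}_i^{+} - \bar{Z}_i^{+\top}\bar{Z}_i^{-} - \bar{Z}_i^{-\top}\bar{Z}_i^{+} + \bar{Z}_i^{-\top}\bar{Z}_i^{-}} \\ &= \sqrt{2n - 2\,\bar{Z}_i^{+\top}\bar{Z}_i^{-}}, \end{aligned} \tag{5}$$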
Figure 4: Illustration of the defined margin for the features in the $i$-th bag $B_i$. $w_i$ and $h_i$ denote the width and height of the rectangle template in the $i$-th bag, respectively. $Z_i^{+}$ and $Z_i^{-}$ denote the corresponding image representations of the positive and negative samples, respectively, in which $z_{ij}^{+}$ and $z_{ij}^{-}$ denote the feature templates, i.e., the normalized image patch vectors in the blue rectangles. $\bar{Z}_i^{+}$ and $\bar{Z}_i^{-}$ denote the average image representations of the positive and negative samples, respectively.

where $z_{ij}^{+}$ and $z_{ij}^{-}$ denote the $j$-th normalized feature templates in the $i$-th bag of the positive and negative samples, respectively, which make up $Z_i^{+}$ and $Z_i^{-}$.

It is easy to verify that $z_{ij}^{+\top} z_{ij}^{+} = z_{ij}^{-\top} z_{ij}^{-} = 1$, so we have

$$\mathrm{margin}_i \ge \mathcal{J}(z_{i1} + \cdots + z_{in}), \tag{6}$$

where $\mathcal{J}$ is the lower bound of the margin function $\mathrm{margin}_i$. For each sample $p$, its image representation in the $i$-th bag is $B_i(p) = \{z_{ij}(p)\}_{j=1}^{n}$, and we use the template center bag $\bar{B}_i = \{\bar{z}_{ij}\}_{j=1}^{n}$ to robustly represent each class (see Figure 4). Our objective is to select a subset of $k$ feature templates from bag $B_i$ that maximizes the margin function $\mathrm{margin}_i$, which can be readily achieved by maximizing its lower bound $\mathcal{J}$:


$$\{z_{i1}, \ldots, z_{ik}\} = \arg\max_{\{z_{i1}, \ldots, z_{ik}\} \subset B_i} \mathcal{J}(z_{i1} + \cdots + z_{ik}). \tag{7}$$
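A minimal sketch of the margin in (4) and (5): each template is L2-normalized (an assumption about what "normalized" means here), the normalized templates of a sample are stacked to form its representation in bag $B_i$, the representations are averaged over samples to obtain $\bar{Z}_i^{+}$ and $\bar{Z}_i^{-}$, and the per-bag Euclidean distances are summed. The nested-list data layout and the function names are hypothetical.

```python
import math


def l2_normalize(vec):
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def bag_center(samples):
    """Average stacked representation of one bag over samples.

    samples[s][j] is the pixel vector of template j in sample s.
    """
    stacked = [[x for tpl in sample for x in l2_normalize(tpl)] for sample in samples]
    count = len(stacked)
    return [sum(col) / count for col in zip(*stacked)]


def margin(bags_pos, bags_neg):
    """Sum over bags of ||mean positive - mean negative||_2, as in (4)-(5)."""
    total = 0.0
    for pos_samples, neg_samples in zip(bags_pos, bags_neg):
        z_pos, z_neg = bag_center(pos_samples), bag_center(neg_samples)
        total += math.sqrt(sum((a - b) ** 2 for a, b in zip(z_pos, z_neg)))
    return total


# One bag, one sample per class, one 2-pixel template per sample.
print(margin([[[[3.0, 4.0]]]], [[[[4.0, 3.0]]]]))  # about 0.2828, i.e. sqrt(0.08)
```

Because each template is normalized before averaging, a bag whose templates differ between the classes only in overall brightness contributes nothing to the margin.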

The vector boosting algorithm in [25] relies on the special case of the exponential loss function of AdaBoost, and thus cannot be readily adapted to solve the above problem. We therefore present a novel online vector boosting algorithm that can readily address it. Our method is motivated by the algorithm in [?], which takes the statistical view of boosting and optimizes a specific objective function $\mathcal{C}$ by sequentially solving

$$(h_j, \alpha_j) = \arg\max_{h \in \mathcal{H}, \alpha} \mathcal{C}(H_{j-1} + \alpha h), \tag{8}$$

where $H_{j-1}(p): \Omega \to \mathbb{R}$ is a strong classifier that is the sum of the first $j-1$ weak classifiers $h_i$, $i = 1, \ldots, j-1$, and $\mathcal{H}$ is the set of all possible weak classifiers.

The proposed algorithm extends the optimization problem in (8) such that both the outputs of its weak classifiers and its final output are vectors rather than scalars. At all times we maintain $n$ candidate weak classifiers, in which the $j$-th weak classifier is defined as

$$h_{ij}(p) = z_{ij}(p). \tag{9}$$


To update the classifier, we first update a subset of the weak classifiers in parallel via an online feature template update strategy (refer to Section 3.3), and then we sequentially choose $k$ weak classifiers $h_{ij}$ from the candidate pool by maximizing the lower bound in (6):

$$h_{ij} = \arg\max_{h \in \{h_{i1}, \ldots, h_{in}\}} \mathcal{J}(H_{i(j-1)} + h), \tag{10}$$

where $H_{i(j-1)} = \sum_{l=1}^{j-1} h_{il}$. Algorithm 2 summarizes the main steps of the proposed online vector boosting method.
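The greedy selection of (10) can be sketched as below. Because the exact form of the lower bound $\mathcal{J}$ in (6) is not reproduced here, the sketch substitutes a stand-in objective: the Euclidean distance between the positive and negative class means of the accumulated vector $H_{i(j-1)} + h$. It also assumes each candidate can be selected at most once, so that $k$ distinct templates are returned. Both choices, and the function names, are illustrative assumptions rather than the exact procedure of the method.

```python
import math


def stand_in_objective(acc_pos, acc_neg):
    # Assumed surrogate for J: distance between the accumulated class means.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(acc_pos, acc_neg)))


def ovb_select(mean_pos, mean_neg, k):
    """Greedily pick k of the n candidate weak classifiers of one bag.

    mean_pos[m] / mean_neg[m]: class-mean vector of candidate h_im. All
    candidates in a bag share one size, so their vectors can be added, and
    the class mean of H + h equals mean(H) + mean(h).
    Returns the chosen indices m*_1, ..., m*_k in selection order.
    """
    if k > len(mean_pos):
        raise ValueError("cannot select more templates than candidates")
    dim = len(mean_pos[0])
    acc_pos, acc_neg = [0.0] * dim, [0.0] * dim  # H_i0 = 0
    chosen = []
    for _ in range(k):
        best_m, best_score = None, -math.inf
        for m in range(len(mean_pos)):
            if m in chosen:
                continue
            score = stand_in_objective(
                [a + b for a, b in zip(acc_pos, mean_pos[m])],
                [a + b for a, b in zip(acc_neg, mean_neg[m])],
            )
            if score > best_score:  # ties keep the lower index
                best_m, best_score = m, score
        chosen.append(best_m)
        # H_ij = H_i(j-1) + h_ij
        acc_pos = [a + b for a, b in zip(acc_pos, mean_pos[best_m])]
        acc_neg = [a + b for a, b in zip(acc_neg, mean_neg[best_m])]
    return chosen


print(ovb_select([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
                 [[0.0, 0.0], [0.0, 0.0], [1.0, 1.0]], k=2))  # [0, 1]
```

In the example, candidate 2 carries no class difference and is never picked, and candidate 1 is preferred over it in the second round because it adds separation along a new direction.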

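Finally, we concatenate all the selected feature templates in all bags to yield a high-dimensional multiscale image representation $\mathbf{x} = (h_{11}^{\top}, \ldots, h_{1k}^{\top}, \ldots, h_{c1}^{\top}, \ldots, h_{ck}^{\top})^{\top} \in \mathbb{R}^{k\sum_{i=1}^{c} w_i h_i}$. We then utilize an orthogonal matrix $S \in \mathbb{R}^{c \times k\sum_{i=1}^{c} w_i h_i}$ to project $\mathbf{x}$ onto a $c$-dimensional feature space:

$$\mathbf{v} = S\mathbf{x}, \tag{11}$$

where the entries of the $i$-th row of $S$ are zero outside the block of $\mathbf{x}$ that holds the $k$ templates selected from bag $B_i$, and all entries that multiply the template $h_{ij}$ share the same value

$$s_{ij} = \pm\frac{1}{\sqrt{k\, w_i h_i}} \quad \text{(each sign with equal probability)}. \tag{12}$$

Hence, the $i$-th entry of $\mathbf{v}$ is

$$v_i = \sum_{j=1}^{k} s_{ij}\, \mathrm{sum}(h_{ij}), \tag{13}$$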
Algorithm 2 Online Vector Boosting (OVB)

Input: Feature templates $\{z_{ij}^{+}, z_{ij}^{-}, j = 1, \ldots, n\}$.

1. Update the $n$ weak classifiers $h_{ij}$, $j = 1, \ldots, n$, according to the strategy introduced in Section 3.3
2. Initialize $H_{ij} = 0$ for all $i, j$
3. for $j = 1$ to $k$ do
4.     for $m = 1$ to $n$ do
5.         $\eta_m = \mathcal{J}(H_{i(j-1)} + h_{im})$
6.     end for
7.     $m^* = \arg\max_m \eta_m$
8.     $h_{ij} \leftarrow h_{im^*}$
9.     $H_{ij} = H_{i(j-1)} + h_{ij}$
10. end for

Output: $k$ selected feature templates $\{z_{im_j^*}^{+}, z_{im_j^*}^{-}, j = 1, \ldots, k\}$


where $\mathrm{sum}(h_{ij})$ can be efficiently computed with integral images.
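To illustrate (12) and (13), the sketch below builds an integral image, evaluates each rectangle sum in constant time, and forms $v_i$ with one random sign per selected template scaled by $1/\sqrt{k\,w_i h_i}$. It assumes that $\mathrm{sum}(h_{ij})$ means the sum of raw pixel intensities inside the template rectangle and that the signs are drawn once and then kept fixed; the `(x0, y0, width, height)` tuple layout and the helper names are hypothetical.

```python
import math
import random


def integral_image(img):
    """ii[r][c] = sum of img over rows < r and columns < c (extra zero row/column)."""
    rows, cols = len(img), len(img[0])
    ii = [[0] * (cols + 1) for _ in range(rows + 1)]
    for r in range(rows):
        row_sum = 0
        for c in range(cols):
            row_sum += img[r][c]
            ii[r + 1][c + 1] = ii[r][c + 1] + row_sum
    return ii


def box_sum(ii, x0, y0, bw, bh):
    """Sum of the bw x bh rectangle with top-left corner (x0, y0), in O(1)."""
    return ii[y0 + bh][x0 + bw] - ii[y0][x0 + bw] - ii[y0 + bh][x0] + ii[y0][x0]


def draw_signs(selected_bags, seed=0):
    # One +1/-1 per selected template, drawn once and reused for every frame.
    rng = random.Random(seed)
    return [[rng.choice((1.0, -1.0)) for _ in bag] for bag in selected_bags]


def compressed_features(ii, selected_bags, signs):
    """v_i = sum_j s_ij * sum(h_ij) with s_ij = sign / sqrt(k * w_i * h_i)."""
    v = []
    for bag, bag_signs in zip(selected_bags, signs):
        k = len(bag)
        _, _, wi, hi = bag[0]  # all templates in a bag share one size
        scale = 1.0 / math.sqrt(k * wi * hi)
        v.append(sum(sign * scale * box_sum(ii, x0, y0, bw, bh)
                     for sign, (x0, y0, bw, bh) in zip(bag_signs, bag)))
    return v


img = [[4 * r + c for c in range(4)] for r in range(4)]
ii = integral_image(img)
print(box_sum(ii, 1, 1, 2, 2))  # 30 (= 5 + 6 + 9 + 10)
```

Fixing the signs once mirrors CT's fixed projection matrix: only the templates themselves change over time, so features from different frames remain comparable for the classifier.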

3.3. Online feature template update 

CT [1] suffers from drift when the target appearance changes significantly, due to its fixed feature templates. In our algorithm, we propose a conservative update scheme that only updates the templates with significant variations. Let $\Delta_{ij} = \|h_{ij}(p_t) - h_{ij}(p_{t-1})\|_2$ denote the variation of template $h_{ij}$ between two consecutive frames. If $\Delta_{ij} < \tau_{\Delta}$, we keep the template $h_{ij}$; otherwise, we update it as

$$h_{ij}(p_t) \leftarrow \eta\, h_{ij}(p_t) + (1 - \eta)\, h_{ij}(p_{t-1}), \tag{14}$$

where $\eta > 0$ represents the update ratio.
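A minimal sketch of the conservative update in (14), assuming that keeping a stable template means retaining its previous value $h_{ij}(p_{t-1})$ and that $\Delta_{ij}$ is the Euclidean norm of the difference; templates are plain lists of floats and the function name is hypothetical. The default values follow the parameter setting in Section 4.1 ($\tau_{\Delta} = 100$, $\eta = 0.05$).

```python
import math


def update_template(prev, cur, tau=100.0, eta=0.05):
    """Return the template kept for the next frame.

    prev: h_ij(p_{t-1}); cur: h_ij(p_t). Stable templates (small variation)
    are preserved unchanged; templates that changed a lot are blended as in (14).
    """
    delta = math.sqrt(sum((c - p) ** 2 for c, p in zip(cur, prev)))
    if delta < tau:
        return list(prev)
    return [eta * c + (1.0 - eta) * p for c, p in zip(cur, prev)]


print(update_template([0.0, 0.0], [3.0, 4.0], tau=10.0))  # [0.0, 0.0] since delta = 5 < 10
```

With a small $\eta$, even a template that passes the threshold moves only slightly toward the current observation, which limits how much a single corrupted frame can change the model.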

3.4. Online trajectory rectification

Similar to CT [1], the tracking task is treated as a binary classification problem, for which the naive Bayes classifier is adopted:

$$H(\mathbf{v}) = \sum_{i=1}^{c} \log\left(\frac{p(v_i \mid y = +)}{p(v_i \mid y = -)}\right), \tag{15}$$
and the conditional distributions are assumed to be Gaussian:

$$p(v_i \mid y = +) \sim \mathcal{N}(\mu_i^{+}, \sigma_i^{+}), \quad p(v_i \mid y = -) \sim \mathcal{N}(\mu_i^{-}, \sigma_i^{-}), \tag{16}$$

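where $\mu_i^{+}$ and $\sigma_i^{+}$ are the mean and standard deviation of the $i$-th positive feature, respectively, and similarly for $\mu_i^{-}$ and $\sigma_i^{-}$. The parameters $\mu_i^{+}$ and $\sigma_i^{+}$ are incrementally updated by

$$\begin{aligned} \mu_i^{+} &\leftarrow \lambda \mu_i^{+} + (1 - \lambda)\hat{\mu}^{+}, \\ \sigma_i^{+} &\leftarrow \sqrt{\lambda (\sigma_i^{+})^2 + (1 - \lambda)(\hat{\sigma}^{+})^2 + \lambda (1 - \lambda)(\mu_i^{+} - \hat{\mu}^{+})^2}, \end{aligned} \tag{17}$$

where $0 < \lambda < 1$ is a learning parameter, $\hat{\sigma}^{+} = \sqrt{\frac{1}{n^{+}} \sum_{k: y(k) = +} (v_i(k) - \hat{\mu}^{+})^2}$ and $\hat{\mu}^{+} = \frac{1}{n^{+}} \sum_{k: y(k) = +} v_i(k)$, with $n^{+}$ the number of positive samples. The parameters $\mu_i^{-}$ and $\sigma_i^{-}$ are updated with similar rules.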
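The classifier in (15)-(17) can be sketched as follows. It evaluates the log-likelihood ratio of (15) under the Gaussian model (16) and applies the incremental update (17) to one feature, computing the new standard deviation with the old mean as (17) prescribes. The small floor on $\sigma$ is an added numerical safeguard rather than part of the equations, and the function names are hypothetical.

```python
import math

SIGMA_FLOOR = 1e-6  # numerical safeguard against zero variance (assumption)


def log_gauss(x, mu, sigma):
    sigma = max(sigma, SIGMA_FLOOR)
    return -0.5 * math.log(2.0 * math.pi * sigma * sigma) - (x - mu) ** 2 / (2.0 * sigma * sigma)


def classify(v, pos_params, neg_params):
    """H(v) in (15): sum of per-feature log-likelihood ratios.

    pos_params[i] = (mu_i+, sigma_i+), neg_params[i] = (mu_i-, sigma_i-).
    """
    return sum(log_gauss(vi, mp, sp) - log_gauss(vi, mn, sn)
               for vi, (mp, sp), (mn, sn) in zip(v, pos_params, neg_params))


def update_gaussian(mu, sigma, samples, lam=0.85):
    """Incremental update (17) of one feature's (mu, sigma) from new samples."""
    count = len(samples)
    mu_hat = sum(samples) / count
    sigma_hat = math.sqrt(sum((s - mu_hat) ** 2 for s in samples) / count)
    new_sigma = math.sqrt(lam * sigma ** 2 + (1.0 - lam) * sigma_hat ** 2
                          + lam * (1.0 - lam) * (mu - mu_hat) ** 2)  # uses the old mu
    new_mu = lam * mu + (1.0 - lam) * mu_hat
    return new_mu, new_sigma


print(classify([1.0], [(1.0, 1.0)], [(-1.0, 1.0)]))  # approximately 2.0
```

The last term of the $\sigma$ update accounts for the shift between the old mean and the new batch mean, so a sudden jump in a feature's value widens its Gaussian instead of only moving its center.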

When $conf = \max H(\mathbf{v}) < \theta$, which means that even the maximum classifier response is dominated by the negative class, the templates and the classifier stop updating. Then, we utilize the motion status in the former consecutive frames to predict the object location:

$$l_t = l_{t-1} + \bar{\mathbf{u}}\,\Delta t, \tag{18}$$

where $\bar{\mathbf{u}}$ is the average motion velocity estimated from the former three frames, and $\Delta t = 1$ is the time step.
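A small sketch of the rectification rule (18). The text does not specify exactly how the average velocity is computed from the former three frames; here it is assumed to be the mean of the two displacements between the last three tracked locations, which equals $(l_{t-1} - l_{t-3})/2$. Locations are `(x, y)` tuples and the function name is hypothetical.

```python
def rectify_location(history, dt=1.0):
    """Predict l_t = l_{t-1} + u_bar * dt from the last three locations.

    history: list of (x, y) tracking locations, most recent last.
    """
    if len(history) < 3:
        raise ValueError("need at least three former locations")
    (x_old, y_old), (x_last, y_last) = history[-3], history[-1]
    # The mean of the two consecutive displacements telescopes to (l_{t-1} - l_{t-3}) / 2.
    ux, uy = (x_last - x_old) / 2.0, (y_last - y_old) / 2.0
    return (x_last + ux * dt, y_last + uy * dt)


print(rectify_location([(0.0, 0.0), (2.0, 1.0), (4.0, 2.0)]))  # (6.0, 3.0)
```

This constant-velocity extrapolation is only meant to bridge a few low-confidence frames; once the classifier score rises above $\theta$ again, the location comes from the classifier.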

4. Experiments 

4.1. Setup

Dataset: We evaluate the proposed algorithm on the CVPR2013 tracking benchmark [21], which includes the results of 29 tracking algorithms on 50 fully annotated sequences (~26000 frames). For better evaluation and analysis of the strengths and weaknesses of the tracking algorithms, the sequences are categorized according to 11 attributes: illumination variation (IV), scale variation (SV), occlusion (OCC), deformation (DEF), motion blur (MB), fast motion (FM), in-plane rotation (IPR), out-of-plane rotation (OPR), out-of-view (OV), background clutters (BC), and low resolution (LR).

Parameter setting: The number of bags is set to $c = 150$. The number of templates in each bag is set to $n = 30$, from which $k = 5$ templates are selected. The threshold of the classifier score is $\theta = 0$. The threshold of appearance update is set to $\tau_{\Delta} = 100$. The radius of the search window is $\gamma = 25$. The radius for sampling positive samples is $\alpha = 2$, from which $n^{+} = 40$ positive samples are extracted. The inner radius for sampling negative samples is $\zeta = 4$ and the corresponding outer radius is $\beta = 15$, from which $n^{-} = 40$ negative samples are selected. The update ratio of the feature templates is $\eta = 0.05$, and the learning parameter of the classifier is set to $\lambda = 0.85$.

Evaluation metric: We employ the precision plot and success plot defined in [21] to evaluate the robustness of the tracking algorithms. The precision plot shows the percentage of frames whose estimated center location errors are within a given threshold distance of the ground truth, where the center location error is the Euclidean distance between the center of the tracked target and that of the manually labeled ground truth. The score at the threshold of 20 pixels is defined as the precision score. The success plot shows the percentage of successful frames as the overlap threshold ranges from 0 to 1, where a frame is successful if its overlap score exceeds the threshold. The overlap score is defined as $S = \frac{|r_t \cap r_a|}{|r_t \cup r_a|}$, where $r_t$ is the tracking output bounding box and $r_a$ is the ground truth bounding box. For fair evaluation, the area under curve (AUC) is preferred to measure the success rate. The one-pass evaluation (OPE), which reports the average precision and success rate given the ground truth of the first frame, is used to evaluate the robustness of our algorithm.
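The two OPE scores can be sketched as below for boxes given as `(x, y, width, height)`. The sketch assumes the conventions commonly used with the benchmark [21]: a frame counts toward precision when its center error is at most the threshold, a frame is successful when its overlap strictly exceeds the overlap threshold, and the AUC is approximated by averaging the success rate over 21 evenly spaced thresholds in [0, 1]. The function names are hypothetical.

```python
import math


def center_error(box_a, box_b):
    ax, ay = box_a[0] + box_a[2] / 2.0, box_a[1] + box_a[3] / 2.0
    bx, by = box_b[0] + box_b[2] / 2.0, box_b[1] + box_b[3] / 2.0
    return math.hypot(ax - bx, ay - by)


def overlap(box_t, box_a):
    """S: intersection area over union area of r_t and r_a."""
    iw = max(0.0, min(box_t[0] + box_t[2], box_a[0] + box_a[2]) - max(box_t[0], box_a[0]))
    ih = max(0.0, min(box_t[1] + box_t[3], box_a[1] + box_a[3]) - max(box_t[1], box_a[1]))
    inter = iw * ih
    union = box_t[2] * box_t[3] + box_a[2] * box_a[3] - inter
    return inter / union if union > 0 else 0.0


def precision_score(tracked, truth, threshold=20.0):
    errors = [center_error(t, g) for t, g in zip(tracked, truth)]
    return sum(e <= threshold for e in errors) / len(errors)


def success_auc(tracked, truth, num_thresholds=21):
    scores = [overlap(t, g) for t, g in zip(tracked, truth)]
    thresholds = [i / (num_thresholds - 1) for i in range(num_thresholds)]
    rates = [sum(s > th for s in scores) / len(scores) for th in thresholds]
    return sum(rates) / len(rates)


truth = [(10, 10, 40, 40), (10, 10, 40, 40)]
tracked = [(10, 10, 40, 40), (100, 100, 40, 40)]
print(precision_score(tracked, truth))          # 0.5
print(round(success_auc(tracked, truth), 3))    # 0.476
```

The AUC summarizes the whole success curve in one number, which is why it is less sensitive to the choice of a single threshold than the 20-pixel precision score.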

4.2. Quantitative comparisons

Overall performance: Figure 5 illustrates the overall performance of the top 10 evaluated tracking algorithms (i.e., SCM [6], Struck [20], TLD [?], ASLA [?], CXT [30], VTS [?], VTD [?], CSK [22], LSK [31], and LOT [32]) and the CT algorithm [1] in terms of the precision plot and success plot. The proposed ACT ranks 1st in both the precision plot and the success plot: the precision score of ACT is 0.714, which outperforms Struck [20] (0.656); meanwhile, in the success plot, ACT achieves an AUC score of 0.504, which outperforms SCM [6] by 0.5%. Note that ACT exploits only simple Haar-like features to represent the object and background, together with a simple naive Bayes classifier of low computational complexity, yet it outperforms Struck and SCM, which resort to complicated learning techniques, in terms of both accuracy and efficiency.

Attribute-based performance: To facilitate analyzing the strengths and weaknesses of the proposed algorithm, we further evaluate ACT on videos with the 11 attributes. Since the AUC score of the success plot is more accurate than the score at a single representative threshold (e.g., 20 pixels) of the precision plot, in the following we mainly analyze ACT based on the success plot.

Figure 5: The success plots and precision plots of OPE for the top 10 trackers and CT. The performance score for each tracker is shown in the legend. The performance score of the precision plot is at the error threshold of 20 pixels, while the performance score of the success plot is the AUC value. Best viewed on color display.

Figure 6 shows the success plots of videos with the attributes on which our method achieves favorable results: ACT ranks within the top 2 on 8 out of 11 attributes. For videos with the attributes FM, MB, IV, DEF, BC, IPR, and OPR, ACT ranks 1st among all evaluated algorithms. In sequences with FM and MB, Struck ranks 2nd, showing that a tracker with a wide search window and dense sampling can perform well on these attributes; so does ACT, which sets the search window size based on the target size and thereby avoids wrongly updating the classifier with distracters. In sequences with IV and OCC, both SCM and ACT perform favorably because they employ local features: ACT exploits Haar-like features from the target via templates of varying sizes, while SCM learns local patch features from the target and background with sparse representation. Furthermore, both of them utilize the target template from the first frame, which is robust to the drift problem. Similarly, on the OPR and IPR subsets, besides our tracker, the SCM and ASLA trackers also obtain satisfactory results, which can be attributed to the effective sparse representations of local patches.

Figure 7 shows that ACT does not perform well on three attributes: LR, OV, and SV. Low resolution makes it harder for ACT to extract useful information from the target object; this can be improved by considering the context information surrounding the target as in [33]. Furthermore, the target appearance changes drastically when OV occurs, and thus ACT cannot deal with these drastic variations favorably. However, tracking failures in this case can be reduced by memorizing information from former frames and enlarging the search range. Finally, ACT only considers single-scale tracking for simplicity; it can be readily extended to multi-scale tracking by constructing scale pyramids, thereby improving performance on the sequences with the SV attribute.

Figure 6: The success plots of videos with different attributes on which ACT achieves favorable results (within the top 2). Panels include: Success plots of OPE - illumination variation (25), occlusion (29), and out-of-plane rotation (39). Best viewed on color display.

4.3. Qualitative comparisons

Deformation: Figure 8 shows the tracking results on three challenging sequences with the deformation attribute. In the Basketball sequence, the target undergoes great changes as the player runs around, with additional interference from other players. We observe that Struck, CXT, and TLD drift once other players occlude the target (e.g., #34). SCM, ASLA, CT, CSK, and LSK drift when the object appearance begins to vary (e.g., #472). VTD, VTS, and our ACT method are able to track the target successfully throughout the whole sequence. Our tracker deals with deformation well due to its online appearance update and trajectory rectification strategies.

Figure 7: The success plots of videos with different attributes on which ACT does not perform well (outside the top 2). Best viewed on color display.

Figure 8: Qualitative results of the 11 trackers (ACT, CT, SCM, Struck, TLD, ASLA, CXT, VTS, VTD, CSK, LSK) over the sequences Basketball, Fleetface, and Shaking from top to bottom (best viewed on high-resolution display). Object appearances therein change drastically with deformation.

The Fleetface sequence suffers from scale changes, appearance variations, and background distraction as the person walks around the room. Many methods fail to track the object when the person turns his head, which results in dramatic appearance changes. Challenges also come from the interference caused by the bookshelf, whose color and texture are similar to those of the object at that time. ASLA, Struck, and our ACT method perform well on this sequence.

Figure 9: Qualitative results of the 11 trackers (ACT, CT, SCM, Struck, TLD, ASLA, CXT, VTS, VTD, CSK, LSK) over the sequences David3, Jogging, and Subway from top to bottom (best viewed on high-resolution display). Object appearances therein change drastically with heavy occlusion.

In the Shaking sequence, the object undergoes both illumination change 
and pose variation. CSK, SCM, and VTD are able to track the object in this 
sequence, but with a lower success rate than our method. 

Heavy occlusion: The targets in the sequences of Figure 9 undergo heavy occlusion by other objects. Furthermore, the targets in these sequences suffer from severe pose variations when the pedestrians turn around. Both factors make these sequences very challenging. Overall, ACT shows favorable performance in tackling these challenges, which can be attributed to the adaptive appearance model and the online template update mechanism. When the confidence score of ACT drops below zero, the classifier and templates stop updating, which prevents the tracker from drifting due to inaccurate samples.

Illumination change: Figure 10 shows the tracking results on three challenging sequences in which the targets suffer from drastic illumination changes. In the CarDark sequence, a car drives along the street at night, undergoing large changes in environmental illumination and background clutter; CT, TLD, CXT, and VTD drift gradually (e.g., #287). In contrast, SCM, Struck, ASLA, VTS, CSK, LSK, and our ACT achieve much better performance. In the Singer2 sequence, there is low contrast between the object and the background in addition to the illumination changes. Many trackers drift to the background at the beginning of this sequence when the stage light changes drastically (e.g., #41). In the Sylvester sequence, challenges such as IV, OPR, and IPR make robust tracking difficult. Nevertheless, our tracker achieves favorable performance due to its adaptive local appearance model.

Figure 10: Qualitative results of the 11 trackers over the sequences CarDark, Singer2, and Sylvester from top to bottom (best viewed on high-resolution display). Objects therein undergo illumination changes.

Other challenges: Figure 11 presents tracking results on sequences with many other challenges, such as MB, FM, BC, and SV. In the Boy sequence, a boy jumps regularly, causing motion blur and scale variation of his face and making it difficult to track. Our ACT performs well on this sequence because of its online feature template update. The target in the Deer sequence suffers from MB, FM, and BC; our ACT works well due to its online trajectory rectification strategy, which prevents the model from being updated with inaccurate samples.


4.4. Analysis of ACT

Online feature template update (OFTU): To verify the effectiveness of OFTU, we develop a tracker named ACT-OFTU that removes the OFTU component from ACT. The quantitative results are illustrated in Figure 12, where we observe that ACT achieves much better performance than ACT-OFTU. OFTU accounts for object appearance variations over time while preserving the stable templates. Furthermore, the updated templates take the appearance variations into account, so the model can adapt well to target appearance changes over time.

Figure 11: Qualitative tracking results of the 11 trackers over the sequences Boy and Deer from top to bottom (best viewed on high-resolution display). Objects therein undergo other challenges.

Figure 12: The success plots and the precision plots for ACT and ACT-OFTU


Online trajectory rectification (OTR): We design a tracker called ACT-OTR, which removes OTR from ACT, to justify the effectiveness of OTR. Figure 13 illustrates the quantitative results, where ACT outperforms ACT-OTR by a large margin, demonstrating that OTR effectively prevents the model from being updated with inaccurate samples.

Online vector boosting feature selection (OVBFS): To justify the effectiveness of OVBFS, we construct a tracker named CT+OFTU+OTR that replaces the OVBFS component in ACT with the compressive Haar-like features of CT [23]. The quantitative results are shown in Figure 14. CT performs unfavorably because its feature templates may select noninformative features when they fall into textureless regions. With OFTU and OTR added, however, CT improves its performance significantly, which demonstrates the effectiveness of OFTU and OTR in ACT.

Figure 13: The success plots (x-axis: overlap threshold) and the precision plots (x-axis: location error threshold) of OPE for ACT and ACT-OTR.

Figure 14: The success plots (x-axis: overlap threshold) and the precision plots of OPE for CT, CT+OFTU+OTR, and ACT.
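To see why CT's compressive Haar-like features can be noninformative in textureless regions, consider how they are built. In CT [1], each compressed feature is a sparse random combination of a few rectangle sums inside the target patch, computed efficiently with an integral image. The sketch below assumes 2 to 4 rectangles per feature with random signs and omits CT's exact weighting and scaling. The function names are hypothetical. Over a constant-intensity region every rectangle sum depends only on the rectangle's area, so the feature vector is the same wherever the patch lies and cannot separate target from background.

```python
import numpy as np

def integral_image(img):
    """Zero-padded summed-area table: ii[r, c] = img[:r, :c].sum()."""
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1))
    ii[1:, 1:] = img.cumsum(axis=0).cumsum(axis=1)
    return ii

def rect_sum(ii, x, y, w, h):
    """Sum of img[y:y+h, x:x+w] in O(1) using the integral image."""
    return ii[y + h, x + w] - ii[y, x + w] - ii[y + h, x] + ii[y, x]

def make_features(n, patch_w, patch_h, rng):
    """n random features, each a signed sum of 2-4 rectangles inside the patch (assumed layout)."""
    feats = []
    for _ in range(n):
        rects = []
        for _ in range(rng.integers(2, 5)):
            w = int(rng.integers(1, patch_w))
            h = int(rng.integers(1, patch_h))
            x = int(rng.integers(0, patch_w - w + 1))
            y = int(rng.integers(0, patch_h - h + 1))
            rects.append((x, y, w, h, float(rng.choice([-1.0, 1.0]))))
        feats.append(rects)
    return feats

def compressive_features(ii, feats, ox, oy):
    """Evaluate every feature on the patch whose top-left corner is (ox, oy)."""
    return np.array([sum(s * rect_sum(ii, ox + x, oy + y, w, h) for x, y, w, h, s in rects)
                     for rects in feats])
```

For instance, on `np.ones((40, 40))` the feature vectors of the patches at `(0, 0)` and `(20, 20)` coincide exactly, which is the degenerate case the ablation attributes to textureless regions.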

5. Conclusions 

In this paper, we have proposed a novel adaptive compressive tracking algorithm that improves on the CT algorithm [1] by a large margin on the CVPR2013 tracking benchmark [21]. The proposed algorithm consists of three main components. First, a novel vector boosting feature selection strategy is proposed to design an effective appearance model. Second, a simple conservative model update strategy is adopted that preserves the stable information while filtering out noisy appearance variations during tracking. Third, a simple and effective trajectory rectification strategy is developed that refines the tracking location when tracking may have become inaccurate. Extensive evaluations on the CVPR2013 tracking benchmark demonstrate the superior performance of the proposed algorithm over other state-of-the-art tracking algorithms.
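As a generic illustration of boosting-style feature selection, the sketch below greedily adds features to an additive logistic model in the spirit of [28] and of the online boosting trackers [4]. It is a standard scalar, batch variant, not the vector boosting of [25] and not the paper's OVBFS. Each candidate feature gets a Gaussian log-likelihood-ratio weak classifier. At each round the feature that most increases the log-likelihood of the summed (strong) classifier is chosen. The Gaussian class models and the function names are assumptions made for illustration.

```python
import numpy as np

def weak_responses(X, y):
    """Per-feature Gaussian log-likelihood ratio log p(v|y=1) - log p(v|y=0)."""
    pos, neg = X[y == 1], X[y == 0]
    mu1, sd1 = pos.mean(axis=0), pos.std(axis=0) + 1e-6
    mu0, sd0 = neg.mean(axis=0), neg.std(axis=0) + 1e-6

    def log_gauss(v, mu, sd):
        # Gaussian log-density without the -0.5*log(2*pi) term, which cancels in the ratio.
        return -np.log(sd) - 0.5 * ((v - mu) / sd) ** 2

    return log_gauss(X, mu1, sd1) - log_gauss(X, mu0, sd0)  # (n_samples, n_features)

def greedy_select(X, y, k):
    """Pick k features whose summed weak responses maximize the logistic log-likelihood."""
    H = weak_responses(X, y)
    strong = np.zeros(len(y))              # current strong-classifier score per sample
    chosen = []
    for _ in range(k):
        best, best_ll = None, -np.inf
        for j in range(H.shape[1]):
            if j in chosen:
                continue
            s = strong + H[:, j]
            # log sigmoid(s) = -log(1 + e^-s) and log(1 - sigmoid(s)) = -log(1 + e^s)
            ll = -np.sum(y * np.logaddexp(0, -s) + (1 - y) * np.logaddexp(0, s))
            if ll > best_ll:
                best, best_ll = j, ll
        chosen.append(best)
        strong += H[:, best]
    return chosen
```

On synthetic data where only one feature separates positives from negatives (for example, feature 3 shifted by 4 for the positive samples), the first selected index is that feature. An online tracker would additionally update the Gaussian parameters frame by frame instead of refitting them in batch.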

References 

[1] Kaihua Zhang, Lei Zhang, and Ming-Hsuan Yang. Real-time compressive
tracking. In Proceedings of European Conference on Computer Vision,
pages 864-877, 2012.

[2] Xi Li, Weiming Hu, Chunhua Shen, Zhongfei Zhang, Anthony Dick, and 
Anton Van Den Hengel. A survey of appearance models in visual object 
tracking. ACM Transactions on Intelligent Systems and Technology, 
4(4):58, 2013. 

[3] Alper Yilmaz, Omar Javed, and Mubarak Shah. Object tracking: A
survey. ACM Computing Surveys, 38(4):13, 2006.

[4] Helmut Grabner, Michael Grabner, and Horst Bischof. Real-time tracking
via on-line boosting. In Proceedings of British Machine Vision Conference,
volume 1, page 6, 2006.

[5] Helmut Grabner, Christian Leistner, and Horst Bischof. Semi-supervised
on-line boosting for robust tracking. In Proceedings of European Conference
on Computer Vision, pages 234-247, 2008.

[6] Xu Jia, Huchuan Lu, and Ming-Hsuan Yang. Visual tracking via adaptive
structural local sparse appearance model. In Proceedings of IEEE
Conference on Computer Vision and Pattern Recognition, pages
1822-1829, 2012.

[7] Tianzhu Zhang, Bernard Ghanem, Si Liu, and Narendra Ahuja. Robust 
visual tracking via multi-task sparse learning. In Proceedings of IEEE 
Conference on Computer Vision and Pattern Recognition, pages
2042-2049, 2012.

[8] Junseok Kwon and Kyoung Mu Lee. Visual tracking decomposition. 
In Proceedings of IEEE Conference on Computer Vision and Pattern 
Recognition, pages 1269-1276, 2010. 





[9] Junseok Kwon and Kyoung Mu Lee. Tracking by sampling trackers. In 
Proceedings of the IEEE International Conference on Computer Vision, 
pages 1195-1202, 2011. 

[10] Boris Babenko, Ming-Hsuan Yang, and Serge Belongie. Robust object 
tracking with online multiple instance learning. IEEE Transactions on 
Pattern Analysis and Machine Intelligence, 33(8):1619-1632, 2011.

[11] Zdenek Kalal, Jiri Matas, and Krystian Mikolajczyk. P-N learning:
Bootstrapping binary classifiers by structural constraints. In Proceedings of
IEEE Conference on Computer Vision and Pattern Recognition, pages 
49-56, 2010. 

[12] Amit Adam, Ehud Rivlin, and Ran Shimshoni. Robust fragments-based 
tracking using the integral histogram. In Proceedings of IEEE Conference
on Computer Vision and Pattern Recognition, volume 1, pages
798-805, 2006. 

[13] David A Ross, Jongwoo Lim, Ruei-Sung Lin, and Ming-Hsuan Yang. 
Incremental learning for robust visual tracking. International Journal 
of Computer Vision, 77(1-3):125-141, 2008. 

[14] Xue Mei and Haibin Ling. Robust visual tracking using ℓ1 minimization.
In Proceedings of the IEEE International Conference on Computer
Vision, pages 1436-1443, 2009. 

[15] Chenglong Bao, Yi Wu, Haibin Ling, and Hui Ji. Real time robust L1
tracker using accelerated proximal gradient approach. In Proceedings of 
IEEE Conference on Computer Vision and Pattern Recognition, pages 
1830-1837, 2012. 

[16] Dong Wang, Huchuan Lu, and Ming-Hsuan Yang. Least soft-threshold 
squares tracking. In Proceedings of IEEE Conference on Computer Vision
and Pattern Recognition, pages 2371-2378, 2013.

[17] Shai Avidan. Support vector tracking. IEEE Transactions on Pattern 
Analysis and Machine Intelligence, 26(8):1064-1072, 2004.

[18] Robert T Collins, Yanxi Liu, and Marius Leordeanu. Online selection of 
discriminative tracking features. IEEE Transactions on Pattern Analysis 
and Machine Intelligence, 27(10):1631-1643, 2005. 





[19] Kaihua Zhang and Huihui Song. Real-time visual tracking via online 
weighted multiple instance learning. Pattern Recognition, 46(1):397-411,
2013.

[20] Sam Hare, Amir Saffari, and Philip HS Torr. Struck: Structured output
tracking with kernels. In Proceedings of the IEEE International
Conference on Computer Vision, pages 263-270, 2011. 

[21] Yi Wu, Jongwoo Lim, and Ming-Hsuan Yang. Online object tracking: 
A benchmark. In Proceedings of IEEE Conference on Computer Vision 
and Pattern Recognition, pages 2411-2418, 2013. 

[22] Joao F Henriques, Rui Caseiro, Pedro Martins, and Jorge Batista. Exploiting
the circulant structure of tracking-by-detection with kernels. In
Proceedings of European Conference on Computer Vision, pages 702-715,
2012.

[23] Kaihua Zhang, Lei Zhang, and M-H Yang. Fast compressive tracking.
IEEE Transactions on Pattern Analysis and Machine Intelligence,
36(10):2002-2015, 2014. 

[24] Huihui Song. Robust visual tracking via online informative feature
selection. Electronics Letters, 50(25):1931-1933, 2014.

[25] Chang Huang, Haizhou Ai, Yuan Li, and Shihong Lao. Vector boosting 
for rotation invariant multi-view face detection. In Proceedings of the
IEEE International Conference on Computer Vision, volume 1, pages 
446-453, 2005. 

[26] Emmanuel J Candes and Terence Tao. Decoding by linear programming. 
IEEE Transactions on Information Theory, 51(12):4203-4215, 2005. 

[27] Emmanuel J Candes and Terence Tao. Near-optimal signal recovery 
from random projections: Universal encoding strategies? IEEE Transactions
on Information Theory, 52(12):5406-5425, 2006.

[28] Jerome Friedman, Trevor Hastie, Robert Tibshirani, et al. Additive 
logistic regression: a statistical view of boosting. The Annals of Statistics,
28(2):337-407, 2000. 





[29] Wei Zhong, Huchuan Lu, and Ming-Hsuan Yang. Robust object tracking 
via sparsity-based collaborative model. In Proceedings of IEEE Conference
on Computer Vision and Pattern Recognition, pages 1838-1845,
2012.

[30] Thang Ba Dinh, Nam Vo, and Gerard Medioni. Context tracker: Exploring
supporters and distracters in unconstrained environments. In Proceedings
of IEEE Conference on Computer Vision and Pattern Recognition,
pages 1177-1184, 2011.

[31] Baiyang Liu, Junzhou Huang, Lin Yang, and Casimir Kulikowski. Robust
tracking using local sparse appearance model and k-selection. In Proceedings
of IEEE Conference on Computer Vision and Pattern Recognition,
pages 1313-1320, 2011. 

[32] Shaul Oron, Aharon Bar-Hillel, Dan Levi, and Shai Avidan. Locally 
orderless tracking. In Proceedings of IEEE Conference on Computer 
Vision and Pattern Recognition, pages 1940-1947, 2012. 

[33] Kaihua Zhang, Lei Zhang, Qingshan Liu, David Zhang, and Ming-Hsuan 
Yang. Fast visual tracking via dense spatio-temporal context learning. 
In Proceedings of European Conference on Computer Vision, pages 127-141,
2014.





